1 General linear regression model

The general equation for linear regression is of the form:
\[y=\theta_0 +\theta_1\times x_1 +... + \theta_n\times x_n\]

By defining a variable \(x_0=1\), we can rewrite this equation as: \[y=\theta_0\times x_0 +\theta_1\times x_1 +... + \theta_n\times x_n\]

Let’s consider a very simple model \(y=\theta_0 +\theta_1\times x_1\) to predict the price of a house based on its size. This model can be described by a line, characterized by an intercept (\(\theta_0\)) and a slope (\(\theta_1\)). If \(\theta_0\) and \(\theta_1\) are known, we can use this model to predict the price of a house from its size (the size is called a feature).

2 Coding implementation of the model

2.1 Naive function

Let’s define a simple Python function to do just that:

import numpy as np
def predict(theta, features):
    features = np.concatenate([[1], features]) # add x_0 = 1
    return np.sum([t*f for t,f in zip(theta, features)])

Now, for any pair of \(\theta\) parameters, we can predict the price of any house based on its size:

# let's assign arbitrary values to the parameters of our linear model
b0, b1 = 3, 8
theta = [b0, b1]
# the house size for which we want to predict the price
size = 30189
features = [size]
print(f"The predicted price for a house of size {features[0]}sqft is ${predict(theta, features)}")
## The predicted price for a house of size 30189sqft is $241515

However, we can certainly do better. While this function correctly predicts the price of a house from its size, it relies on a Python-level loop (the list comprehension over the parameters), so it probably would not scale well as the number of features grows or when we have to deal with a bigger dataset.

2.2 Using linear algebra

Instead of looping inside a function, we can make use of linear algebra and matrix multiplication to calculate the predicted price. This allows us to predict house prices for several sets of \(\theta\) and several training examples at once, while avoiding slow Python for-loops.

Remember that for two matrices \(A\) and \(B\) of respective shapes (M x N) and (N x K) (note the shared inner dimension N), their matrix product (dot product) \(A\cdot B\) is a matrix \(C\) of shape (M x K).
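A quick way to convince yourself of this shape rule is to multiply arrays of known shapes and inspect the result. The sizes below (M=4, N=2, K=3) are arbitrary choices for illustration; they happen to match the 4 houses x 2 columns features matrix and the 2 x 3 theta matrix used later on.

```python
import numpy as np

# Arbitrary sizes for illustration: A is (M x N), B is (N x K)
M, N, K = 4, 2, 3
A = np.ones((M, N))
B = np.ones((N, K))

C = np.dot(A, B)  # inner dimensions agree (both N), so the product is defined
print(C.shape)    # (4, 3), i.e. (M x K)

# If the inner dimensions differ, numpy refuses to multiply
mismatchRaised = False
try:
    np.dot(A, np.ones((K, N)))  # (M x N) times (K x N): N != K
except ValueError:
    mismatchRaised = True
print(mismatchRaised)  # True
```

Since both inputs are filled with ones, every entry of `C` is simply N (a sum of N products equal to 1).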

In our situation, we can take the dot product of our features vector with our theta vector (making sure their sizes agree) to predict the price of our houses. Using the numpy library, we have several options:

  • np.dot, if both vectors are numpy arrays. Here we reshape our theta vector into a (2x1) column vector.
  • since Python 3.5, numpy supports infix matrix multiplication between numpy arrays with the @ operator; this is equivalent to using numpy.matmul.
  • convert both vectors to numpy matrices and use the * operator, which performs matrix multiplication for that type.

Note (1): np.matmul differs from np.dot in two important ways:

  • Multiplication by scalars is not allowed.
  • Stacks of matrices are broadcast together as if the matrices were elements.

For np.matmul: If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.

For np.dot: For 2D arrays it is equivalent to matrix multiplication, and for 1D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of A and the second-to-last of B.
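Both differences are easy to observe directly. The array shapes in this sketch are arbitrary, chosen only to make the results easy to read.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)

# Difference 1: scalars. np.dot simply scales the array, np.matmul refuses.
scaled = np.dot(a, 3)          # same as a * 3
try:
    np.matmul(a, 3)
    matmulAcceptsScalar = True
except ValueError:
    matmulAcceptsScalar = False
print(matmulAcceptsScalar)     # False

# Difference 2: stacks. Two stacks of 5 matrices, shapes (5, 2, 3) and (5, 3, 4)
left = np.ones((5, 2, 3))
right = np.ones((5, 3, 4))
# matmul pairs the i-th matrix of left with the i-th matrix of right
print(np.matmul(left, right).shape)  # (5, 2, 4)
# dot sums over the last axis of left and the second-to-last axis of right,
# combining every matrix of left with every matrix of right
print(np.dot(left, right).shape)     # (5, 2, 5, 4)
```

For plain 2D inputs, like the ones used in the rest of this post, the two functions give the same result.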

Note (2): We also have to choose which data container to use for our parameter (\(\theta\)) and feature (\(x_n\)) vectors: numpy arrays or numpy matrices.

Let’s see how all the above options look in practice:

  • Working with numpy arrays
# Shaping our arrays (these two constructs are equivalent):
theta = np.array([b0, b1]).reshape((2, 1))
theta = np.array([b0, b1])[:, np.newaxis]
features = np.array([1, size]) # note that we added x_0
print(f"Dot product: predicted price ${np.dot(features, theta)}")
## Dot product: predicted price $[241515]
print(f"np.matmul: predicted price ${np.matmul(features, theta)}")
## np.matmul: predicted price $[241515]
print(f"'@' operator: predicted price ${features@theta}")
## '@' operator: predicted price $[241515]
  • Working with numpy matrices
# Again, several syntactic choices are possible (both lines below are equivalent)
theta = np.matrix([[b0], [b1]])
theta = np.matrix([b0, b1]).transpose()
features = np.matrix([1, size])
print(f"Matrix multiplication: predicted price ${features*theta}")
## Matrix multiplication: predicted price $[[241515]]

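An advantage of using multidimensional numpy arrays over numpy matrices is that both products are available as plain operators (@ for the matrix product, * for the element-wise product), whereas with numpy matrices * is the matrix product and element-wise multiplication requires np.multiply. The NumPy documentation also no longer recommends the matrix class. So from now on, all the code uses arrays to represent matrices.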
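The difference in operator meaning is easiest to see side by side on a small 2x2 example (the values are arbitrary). Note that recent NumPy versions may emit a PendingDeprecationWarning when an np.matrix is created.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
m = np.matrix([[1, 2], [3, 4]])  # may emit a PendingDeprecationWarning

# numpy arrays: * is element-wise, @ (or np.dot) is the matrix product
arrElementwise = x * x              # [[1, 4], [9, 16]]
arrProduct = x @ x                  # [[7, 10], [15, 22]]

# numpy matrices: * is the matrix product, element-wise needs np.multiply
matProduct = m * m                  # [[7, 10], [15, 22]]
matElementwise = np.multiply(m, m)  # [[1, 4], [9, 16]]
```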

3 Univariate linear regression for prediction

In the case of univariate linear regression, the model is of the form: \[y=\theta_0 +\theta_1\times x_1\]

In our case, \(x_1\) is the size of the house whose price we want to predict.

3.1 Predicting the price of different houses for different \(\theta\) parameters

Now that we have some linear algebra working in Python, we can efficiently compute the predictions for any number of different \(\theta\) sets and house sizes:

# Let's get different house sizes for which we want to predict the price
houseSizes = np.array([234104, 141016, 129534, 98852])
# From them, let's make a proper (4x2) matrix, adding x_0 = 1 to each example
features = np.vstack([np.ones(len(houseSizes)), houseSizes]).transpose()
# 3 different [intercept, slope] pairs, (2x3) matrix
theta = np.array([[-40, 0.25], [200, 0.1], [-150, 0.4]]).transpose()
print(features)
## [[1.00000e+00 2.34104e+05]
##  [1.00000e+00 1.41016e+05]
##  [1.00000e+00 1.29534e+05]
##  [1.00000e+00 9.88520e+04]]
print(theta)
## [[-4.0e+01  2.0e+02 -1.5e+02]
##  [ 2.5e-01  1.0e-01  4.0e-01]]
# predictions for each house (rows), for each theta pair (columns)
print(np.dot(features, theta))
## [[58486.  23610.4 93491.6]
##  [35214.  14301.6 56256.4]
##  [32343.5 13153.4 51663.6]
##  [24673.  10085.2 39390.8]]

3.2 Cost function

Alright, we now know how to use our model to compute predictions when the parameters (the \(\theta\) vector) are known. But how do we find the values of \(\theta\)? If we have a dataset of house sizes and their corresponding prices, we can fit our linear model to this dataset to estimate the \(\theta\) parameters (in other words, we train our model).

In order to do this, we need to define a cost function of the form: \(J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2\)

with \(h_\theta(x)=\hat{y}\) being our predictive function (the one we worked with above): \(h_\theta(x) = \sum_{j=0}^{n} \theta_j\times x_j\), with \(x_0=1\).

  • \(m\) is the number of training examples we have
  • \(n\) is the number of features of our model (in our case we only use the house size, so \(n=1\))
  • \(y\) is the true value that we are trying to predict with our \(h_\theta(x)\) function
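Written over all training examples at once, the cost takes a compact matrix form. In the equation below, \(X\) denotes the (m x (n+1)) matrix whose rows are the feature vectors \(x^{(i)}\) (with \(x_0=1\)) and \(y\) the column vector of true values; this notation is introduced here for illustration. The last expression is what the computeCost function below evaluates with np.dot:

\[J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\Big(\sum_{j=0}^{n}\theta_j x_j^{(i)}-y^{(i)}\Big)^2=\frac{1}{2m}\sum_{i=1}^{m}\big((X\theta)_i-y^{(i)}\big)^2=\frac{1}{2m}(X\theta-y)^\top(X\theta-y)\]

When \(\theta\) holds several parameter vectors as columns, the formula applies to each column separately, which is why computeCost sums over axis=0.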
# For correctly shaped theta, features and trueValues matrices, we can define the cost function as
def computeCost(theta, features, trueValues):
    predictions = np.dot(features, theta) # (number of examples x number of theta vectors) matrix
    error = (predictions-trueValues)**2 # with np.matrix inputs, ** is a matrix power, not an element-wise square: use np.power instead
    return np.sum(error, axis=0)/(2*features.shape[0]) # one cost per theta vector (column)
# example  
theta = np.array([2, 9])[:, np.newaxis]
features = np.array([[1, 397263], [1, 567108]])
trueValues = np.array([457145, 63083])[:, np.newaxis]
print(computeCost(theta, features, trueValues))
## [8.78347575e+12]
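One way to check that the vectorized cost really implements \(J(\theta)\) is to compare it with a deliberately naive version that follows the double sum term by term. In this sketch, computeCostLoop is a hypothetical helper introduced for the comparison; it handles a single \(\theta\) vector stored as a 1-D array, and reuses the example values above.

```python
import numpy as np

def computeCostLoop(theta, features, trueValues):
    """Hypothetical helper: J(theta) for one theta vector, written as the
    explicit sums of the formula (outer loop over examples i, inner over j)."""
    m = len(features)
    total = 0.0
    for i in range(m):
        h = sum(theta[j] * features[i][j] for j in range(len(theta)))  # h(x^(i))
        total += (h - trueValues[i]) ** 2
    return total / (2 * m)

# Same example values as above, stored as 1-D arrays
theta = np.array([2, 9])
features = np.array([[1, 397263], [1, 567108]])
trueValues = np.array([457145, 63083])

# Matrix form: (X theta - y)^T (X theta - y) / (2m)
residuals = features @ theta - trueValues
matrixCost = residuals @ residuals / (2 * features.shape[0])

loopCost = computeCostLoop(theta, features, trueValues)
print(loopCost, matrixCost)  # both match the 8.78347575e+12 printed above
```

The loop version is only useful as a cross-check: it is exactly the kind of Python-level loop the vectorized version is meant to avoid.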

3.3 Applying linear regression on real data

3.3.1 The dataset

Now that the functions are ready, let’s get some data and build a linear model to predict the price of houses based on their size.

Note: This dataset was scraped from Zillow and contains information about the houses that were for sale in Santa Monica on 05/12/2017.

import pandas as pd
dataHouses = pd.read_csv("../data/2017-05-12_141127.csv")
Santa Monica - House dataset (Zillow - 05/12/2017)
address city state zip price sqft bedrooms bathrooms days_on_zillow sale_type url
1 1021 12th St APT 107 Santa Monica CA 90403 698000 804 1 1.0 1 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1021-12th-St-APT-107-Santa-Monica-CA-90403/20478727_zpid/
2 913 18th St APT 4 Santa Monica CA 90403 1099000 1425 2 3.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/913-18th-St-APT-4-Santa-Monica-CA-90403/20475136_zpid/
3 270 Palisades Beach Rd UNIT 301 Santa Monica CA 90402 2999000 1590 2 2.0 NaN Coming Soon http://www.zillow.com/homes/for_sale//homedetails/270-Palisades-Beach-Rd-UNIT-301-Santa-Monica-CA-90402/20486912_zpid/
4 808 5th St APT 7 Santa Monica CA 90403 680000 980 2 2.0 2 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/808-5th-St-APT-7-Santa-Monica-CA-90403/2094307910_zpid/
5 1919 4th St # A Santa Monica CA 90405 2200000 2111 3 3.0 1 Townhouse For Sale http://www.zillow.com/homes/for_sale//homedetails/1919-4th-St-A-Santa-Monica-CA-90405/2097316419_zpid/
6 1138 12th St APT 5 Santa Monica CA 90403 849000 1045 2 2.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1138-12th-St-APT-5-Santa-Monica-CA-90403/20478785_zpid/
7 1319 11th St APT 6 Santa Monica CA 90401 549000 558 1 1.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1319-11th-St-APT-6-Santa-Monica-CA-90401/20479668_zpid/
8 444 21st St Santa Monica CA 90402 7895000 7311 6 8.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/444-21st-St-Santa-Monica-CA-90402/20476513_zpid/
9 1035 25th St Santa Monica CA 90403 2329000 1501 3 2.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/1035-25th-St-Santa-Monica-CA-90403/20475429_zpid/
10 1307 Berkeley St Santa Monica CA 90404 2695000 2603 5 4.0 1 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1307-Berkeley-St-Santa-Monica-CA-90404/20470335_zpid/
11 933 20th St UNIT B Santa Monica CA 90403 1049000 1227 3 2.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/933-20th-St-UNIT-B-Santa-Monica-CA-90403/20475248_zpid/
12 2nd St APT 201 Santa Monica CA 90403 NaN 971 2 2.0 10 Pre-Foreclosure http://www.zillow.com/homes/for_sale/129611562_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
13 722 Marine St Santa Monica CA 90405 2795000 2463 4 3.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/722-Marine-St-Santa-Monica-CA-90405/20483309_zpid/
14 933 11th St APT 15 Santa Monica CA 90403 819000 1018 2 2.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/933-11th-St-APT-15-Santa-Monica-CA-90403/20478857_zpid/
15 3110 Broadway Santa Monica CA 90404 489000 NaN 1 1.0 7 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/3110-Broadway-Santa-Monica-CA-90404/82816526_zpid/
16 1433 Euclid St Santa Monica CA 90404 9300000 14698 26 24.0 3 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1433-Euclid-St-Santa-Monica-CA-90404/2097803579_zpid/
17 1837 Euclid St Santa Monica CA 90404 2225000 3901 0 NaN 3 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1837-Euclid-St-Santa-Monica-CA-90404/2094325481_zpid/
18 1130 Georgina Ave Santa Monica CA 90402 6495000 2325 3 2.0 7 House For Sale http://www.zillow.com/homes/for_sale//homedetails/1130-Georgina-Ave-Santa-Monica-CA-90402/20477549_zpid/
19 1240 24th St UNIT 3 Santa Monica CA 90404 1299000 1803 3 3.0 NaN Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1240-24th-St-UNIT-3-Santa-Monica-CA-90404/20474535_zpid/
20 1113 24th St Santa Monica CA 90403 1995000 NaN 4 3.0 6 House For Sale http://www.zillow.com/homes/for_sale//homedetails/1113-24th-St-Santa-Monica-CA-90403/2098029479_zpid/
21 1225 Washington Ave APT 3 Santa Monica CA 90403 639000 589 1 1.0 7 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1225-Washington-Ave-APT-3-Santa-Monica-CA-90403/20478675_zpid/
22 2311 4th St APT 310 Santa Monica CA 90405 925000 979 2 2.0 8 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/2311-4th-St-APT-310-Santa-Monica-CA-90405/20484219_zpid/
23 628 20th St Santa Monica CA 90402 3695000 3063 5 2.5 8 House For Sale http://www.zillow.com/homes/for_sale//homedetails/628-20th-St-Santa-Monica-CA-90402/20476600_zpid/
24 415 23rd St Santa Monica CA 90402 3795000 3100 3 3.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/415-23rd-St-Santa-Monica-CA-90402/20476050_zpid/
25 900 Euclid St APT 108 Santa Monica CA 90403 1200000 1384 2 2.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/900-Euclid-St-APT-108-Santa-Monica-CA-90403/20478588_zpid/
26 2602 25th St Santa Monica CA 90405 1969000 1838 4 2.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/2602-25th-St-Santa-Monica-CA-90405/20473096_zpid/
27 1756 Franklin St Santa Monica CA 90404 1725000 3180 5 5.0 7 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1756-Franklin-St-Santa-Monica-CA-90404/2094358620_zpid/
28 621 Marguerita Ave Santa Monica CA 90402 12495000 6500 6 9.0 10 House For Sale http://www.zillow.com/homes/for_sale//homedetails/621-Marguerita-Ave-Santa-Monica-CA-90402/20486225_zpid/
29 1610 Broadway Santa Monica CA 90404 9125000 11211 15 16.0 22 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1610-Broadway-Santa-Monica-CA-90404/2106454742_zpid/
30 2215 Dewey St Santa Monica CA 90405 3850000 4500 5 5.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/2215-Dewey-St-Santa-Monica-CA-90405/2111948923_zpid/
31 1238 12th St UNIT 4 Santa Monica CA 90401 899000 1265 2 3.0 15 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1238-12th-St-UNIT-4-Santa-Monica-CA-90401/20479485_zpid/
32 438 11th St Santa Monica CA 90402 3269000 2000 3 3.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/438-11th-St-Santa-Monica-CA-90402/20477675_zpid/
33 817 12th St Santa Monica CA 90403 6400000 10639 0 NaN 24 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/817-12th-St-Santa-Monica-CA-90403/2094590790_zpid/
34 333 17th St Santa Monica CA 90402 6500000 4900 5 7.0 NaN Coming Soon http://www.zillow.com/homes/for_sale//homedetails/333-17th-St-Santa-Monica-CA-90402/20476895_zpid/
35 1440 23rd St APT 116 Santa Monica CA 90404 539000 637 1 1.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1440-23rd-St-APT-116-Santa-Monica-CA-90404/20473990_zpid/
36 26 Arcadia Ter Santa Monica CA 90401 5757750 4265 6 6.0 28 House For Sale http://www.zillow.com/homes/for_sale//homedetails/26-Arcadia-Ter-Santa-Monica-CA-90401/20484705_zpid/
37 820 4th St APT 1 Santa Monica CA 90403 4250000 9282 16 14.0 29 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/820-4th-St-APT-1-Santa-Monica-CA-90403/2094627028_zpid/
38 2427 Centinela Ave APT H Santa Monica CA 90405 468000 630 1 1.0 13 Auction http://www.zillow.com/homes/for_sale//homedetails/2427-Centinela-Ave-APT-H-Santa-Monica-CA-90405/20471714_zpid/
39 Centinela Ave APT H Santa Monica CA 90405 NaN 630 1 1.0 23 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/128644073_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
40 1840 17th St Santa Monica CA 90404 1899000 1000 3 1.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/1840-17th-St-Santa-Monica-CA-90404/20480125_zpid/
41 721 Cedar St UNIT A Santa Monica CA 90405 779000 994 2 1.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/721-Cedar-St-UNIT-A-Santa-Monica-CA-90405/20482323_zpid/
42 920 Alta Ave Santa Monica CA 90402 6395000 6274 6 7.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/920-Alta-Ave-Santa-Monica-CA-90402/2096799825_zpid/
43 238 19th St Santa Monica CA 90402 6500000 4900 5 7.0 NaN Coming Soon http://www.zillow.com/homes/for_sale//homedetails/238-19th-St-Santa-Monica-CA-90402/20476698_zpid/
44 1811 34th St Santa Monica CA 90404 1175000 1213 3 1.0 11 House For Sale http://www.zillow.com/homes/for_sale//homedetails/1811-34th-St-Santa-Monica-CA-90404/20471260_zpid/
45 2911 4th St APT 117 Santa Monica CA 90405 1395000 1629 4 3.0 24 Townhouse For Sale http://www.zillow.com/homes/for_sale//homedetails/2911-4th-St-APT-117-Santa-Monica-CA-90405/20483085_zpid/
46 1705 Ocean Ave UNIT 314 Santa Monica CA 90401 2799000 1609 2 3.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1705-Ocean-Ave-UNIT-314-Santa-Monica-CA-90401/2097304381_zpid/
47 515 Ocean Ave UNIT 501S Santa Monica CA 90402 3250000 1865 2 3.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/515-Ocean-Ave-UNIT-501S-Santa-Monica-CA-90402/2094576749_zpid/
48 1755 Ocean Ave APT 505 Santa Monica CA 90401 1850000 1054 1 1.0 25 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1755-Ocean-Ave-APT-505-Santa-Monica-CA-90401/2103898721_zpid/
49 1963 17th St Santa Monica CA 90404 2250000 4408 0 NaN 29 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1963-17th-St-Santa-Monica-CA-90404/2094636550_zpid/
50 459 22nd St Santa Monica CA 90402 4995000 4251 4 5.0 21 House For Sale http://www.zillow.com/homes/for_sale//homedetails/459-22nd-St-Santa-Monica-CA-90402/20476165_zpid/
51 211 17th St Santa Monica CA 90402 3995000 3025 4 4.0 14 House For Sale http://www.zillow.com/homes/for_sale//homedetails/211-17th-St-Santa-Monica-CA-90402/20476916_zpid/
52 201 Ocean Ave UNIT 809B Santa Monica CA 90402 3300000 1562 3 3.0 10 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-809B-Santa-Monica-CA-90402/20487148_zpid/
53 212 Marine St UNIT 205 Santa Monica CA 90405 1997500 1660 2 2.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/212-Marine-St-UNIT-205-Santa-Monica-CA-90405/82750007_zpid/
54 838 16th St APT 9 Santa Monica CA 90403 1795000 1440 2 3.0 30 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/838-16th-St-APT-9-Santa-Monica-CA-90403/67420575_zpid/
55 402 Pacific St Santa Monica CA 90405 4190000 3780 6 6.0 23 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/402-Pacific-St-Santa-Monica-CA-90405/2102445408_zpid/
56 528 16th St Santa Monica CA 90402 4250000 3411 4 4.0 14 House For Sale http://www.zillow.com/homes/for_sale//homedetails/528-16th-St-Santa-Monica-CA-90402/20477097_zpid/
57 1927 Cloverfield Blvd Santa Monica CA 90404 3500000 4856 7 6.0 NaN Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1927-Cloverfield-Blvd-Santa-Monica-CA-90404/2094619149_zpid/
58 718 Euclid St Santa Monica CA 90402 3399000 3129 6 3.5 16 House For Sale http://www.zillow.com/homes/for_sale//homedetails/718-Euclid-St-Santa-Monica-CA-90402/92345675_zpid/
59 201 Ocean Ave UNIT 1610P Santa Monica CA 90402 NaN 1325 2 1.0 1275 Foreclosed http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-1610P-Santa-Monica-CA-90402/20487063_zpid/
60 808 # 2 5th St # 2 Santa Monica CA 90403 401900 629 1 1.0 62 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/808-2-5th-St-2-Santa-Monica-CA-90403/2095005508_zpid/
61 Ocean Ave UNIT 1409B Santa Monica CA 90402 NaN 1562 3 3.0 422 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/118990030_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
62 110 Larkin Pl Santa Monica CA 90402 11000000 3697 4 5.0 443 House For Sale http://www.zillow.com/homes/for_sale//homedetails/110-Larkin-Pl-Santa-Monica-CA-90402/20477359_zpid/
63 0 Montana Ave Santa Monica CA 90403 8000000 NaN NaN NaN 105 House For Sale http://www.zillow.com/homes/for_sale//homedetails/0-Montana-Ave-Santa-Monica-CA-90403/2095518614_zpid/
64 1255 Palisades Beach Rd Santa Monica CA 90401 5995000 3197 3 4.0 115 House For Sale http://www.zillow.com/homes/for_sale//homedetails/1255-Palisades-Beach-Rd-Santa-Monica-CA-90401/20484767_zpid/
65 2721 2nd St APT 215 Santa Monica CA 90405 450000 643 1 1.0 76 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/2721-2nd-St-APT-215-Santa-Monica-CA-90405/20482930_zpid/
66 1024 Palisades Beach Rd Santa Monica CA 90403 8995000 7267 5 8.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/1024-Palisades-Beach-Rd-Santa-Monica-CA-90403/20485840_zpid/
67 10th St Santa Monica CA 90402 3720000 5260 7 6.0 638 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/123583623_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
68 22nd St Santa Monica CA 90402 4790000 4374 5 6.0 499 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/125417066_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
69 1231 18th St APT 1 Santa Monica CA 90404 1200000 2347 4 3.0 1226 Foreclosed http://www.zillow.com/homes/for_sale//homedetails/1231-18th-St-APT-1-Santa-Monica-CA-90404/20474426_zpid/
70 209 Euclid St Santa Monica CA 90402 5480000 6859 5 7.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/209-Euclid-St-Santa-Monica-CA-90402/20477487_zpid/
71 614 Washington Ave Santa Monica CA 90403 1899900 1905 3 2.0 32 House For Sale http://www.zillow.com/homes/for_sale//homedetails/614-Washington-Ave-Santa-Monica-CA-90403/20484858_zpid/
72 323 21st Pl Santa Monica CA 90402 7495000 8180 5 8.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/323-21st-Pl-Santa-Monica-CA-90402/20476319_zpid/
73 21st Pl Santa Monica CA 90402 164000 8180 5 8.0 235 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/126588922_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
74 1147 24th St APT A Santa Monica CA 90403 553000 444 1 1.0 5091 Foreclosed http://www.zillow.com/homes/for_sale//homedetails/1147-24th-St-APT-A-Santa-Monica-CA-90403/20474189_zpid/
75 4th St APT 3 Santa Monica CA 90405 NaN 1007 2 2.0 77 Pre-Foreclosure http://www.zillow.com/homes/for_sale/128838484_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
76 2520 5th St Santa Monica CA 90405 2460000 2520 4 4.0 1674 Foreclosed http://www.zillow.com/homes/for_sale//homedetails/2520-5th-St-Santa-Monica-CA-90405/20482416_zpid/
77 835 San Vicente Blvd Santa Monica CA 90402 5750000 4828 5 7.0 133 House For Sale http://www.zillow.com/homes/for_sale//homedetails/835-San-Vicente-Blvd-Santa-Monica-CA-90402/20477377_zpid/
78 San Vicente Blvd Santa Monica CA 90402 5160000 4828 3 5.0 21 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/116774388_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
79 Ocean Park Blvd Santa Monica CA 90405 1730000 2010 3 3.0 29 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/116777205_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
80 Delaware Ave Santa Monica CA 90404 NaN 1276 3 2.0 45 Pre-Foreclosure http://www.zillow.com/homes/for_sale/128915182_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
81 Chelsea Ave Santa Monica CA 90403 NaN 1608 2 2.0 57 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/128591935_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
82 Ocean Park Blvd Santa Monica CA 90405 NaN 951 1 2.0 98 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/127102029_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
83 1139 23rd St Santa Monica CA 90403 2950000 2689 6 6.0 41 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1139-23rd-St-Santa-Monica-CA-90403/2123600159_zpid/
84 2441 34th St Santa Monica CA 90405 4495000 11856 16 12.0 59 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/2441-34th-St-Santa-Monica-CA-90405/2098889594_zpid/
85 Hill St Santa Monica CA 90405 1720000 2016 4 2.0 382 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/122985485_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
86 23rd St UNIT 8 Santa Monica CA 90403 NaN 2046 3 3.0 172 Pre-Foreclosure http://www.zillow.com/homes/for_sale/127248642_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
87 Ashland Ave Santa Monica CA 90405 NaN 2032 4 2.0 178 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/126744711_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
88 5th St APT 208 Santa Monica CA 90403 NaN 1315 2 2.0 115 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/127013074_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
89 1918 11th St APT E Santa Monica CA 90404 799999 979 3 2.0 33 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1918-11th-St-APT-E-Santa-Monica-CA-90404/2094703249_zpid/
90 2316 3rd St Santa Monica CA 90405 2599000 2246 3 4.0 181 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/2316-3rd-St-Santa-Monica-CA-90405/20484532_zpid/
91 Pier Ave APT D Santa Monica CA 90405 761000 994 2 2.0 444 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/116777211_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
92 2454 Ocean Park Blvd Santa Monica CA 90405 1075000 1178 3 2.0 NaN Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/2454-Ocean-Park-Blvd-Santa-Monica-CA-90405/20472753_zpid/
93 110 Ocean Park Blvd APT 502 Santa Monica CA 90405 3650000 1800 2 2.0 30 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/110-Ocean-Park-Blvd-APT-502-Santa-Monica-CA-90405/20483913_zpid/
94 Yorkshire Ave Santa Monica CA 90404 NaN 1001 3 1.0 583 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/127225431_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
95 20th St Santa Monica CA 90404 1050000 1326 2 2.0 262 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/116774375_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
96 535 Ocean Ave UNIT 11C Santa Monica CA 90402 2650000 1585 2 2.0 35 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/535-Ocean-Ave-UNIT-11C-Santa-Monica-CA-90402/20486446_zpid/
97 Virginia Ave Santa Monica CA 90404 1310000 1905 3 2.0 185 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/117509820_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
98 1144 17th St APT 14 Santa Monica CA 90403 1099000 1510 3 3.0 43 Townhouse For Sale http://www.zillow.com/homes/for_sale//homedetails/1144-17th-St-APT-14-Santa-Monica-CA-90403/2144453942_zpid/
99 19 Seaview Ter Santa Monica CA 90401 4599000 2500 4 3.0 42 House For Sale http://www.zillow.com/homes/for_sale//homedetails/19-Seaview-Ter-Santa-Monica-CA-90401/20484728_zpid/
100 1824 20th St APT A Santa Monica CA 90404 465000 573 0 1.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1824-20th-St-APT-A-Santa-Monica-CA-90404/20473574_zpid/
101 1334 19th St APT 4 Santa Monica CA 90404 998000 1414 2 3.0 213 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1334-19th-St-APT-4-Santa-Monica-CA-90404/20474723_zpid/
102 723 Palisades Beach Rd UNIT 107 Santa Monica CA 90402 1575000 1072 2 2.0 157 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/723-Palisades-Beach-Rd-UNIT-107-Santa-Monica-CA-90402/20485821_zpid/
103 157 Hart Ave Santa Monica CA 90405 2450000 1878 3 3.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/157-Hart-Ave-Santa-Monica-CA-90405/20483535_zpid/
104 222 20th St Santa Monica CA 90402 7795000 7581 6 8.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/222-20th-St-Santa-Monica-CA-90402/20476678_zpid/
105 1053 18th St Santa Monica CA 90403 5050000 9650 15 15.0 NaN Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1053-18th-St-Santa-Monica-CA-90403/2099208641_zpid/
106 2922 Montana Ave APT B Santa Monica CA 90403 1549000 1800 2 3.0 NaN Townhouse For Sale http://www.zillow.com/homes/for_sale//homedetails/2922-Montana-Ave-APT-B-Santa-Monica-CA-90403/55325158_zpid/
107 1633 Sunset Ave Santa Monica CA 90405 2250000 1558 2 2.0 71 House For Sale http://www.zillow.com/homes/for_sale//homedetails/1633-Sunset-Ave-Santa-Monica-CA-90405/20481398_zpid/
108 915 12th St APT 5 Santa Monica CA 90403 1350000 1512 2 3.0 32 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/915-12th-St-APT-5-Santa-Monica-CA-90403/63789044_zpid/
109 41 Seaview Ter Santa Monica CA 90401 2800000 NaN 0 NaN NaN Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/41-Seaview-Ter-Santa-Monica-CA-90401/2094788258_zpid/
110 1720 Franklin St Santa Monica CA 90404 1200000 675 1 1.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/1720-Franklin-St-Santa-Monica-CA-90404/20471093_zpid/
111 101 California Ave UNIT 603 Santa Monica CA 90403 1200000 674 1 1.0 35 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/101-California-Ave-UNIT-603-Santa-Monica-CA-90403/20485721_zpid/
112 1755 Ocean Ave APT 704 Santa Monica CA 90401 4800000 1737 2 3.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1755-Ocean-Ave-APT-704-Santa-Monica-CA-90401/2094777858_zpid/
113 835 Ashland Ave Santa Monica CA 90405 4300000 8986 15 17.0 NaN Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/835-Ashland-Ave-Santa-Monica-CA-90405/2101865202_zpid/
114 463 18th St Santa Monica CA 90402 5300000 5759 5 5.0 196 House For Sale http://www.zillow.com/homes/for_sale//homedetails/463-18th-St-Santa-Monica-CA-90402/20476760_zpid/
115 714 Bay St Santa Monica CA 90405 2100000 3240 8 4.0 NaN Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/714-Bay-St-Santa-Monica-CA-90405/2102328794_zpid/
116 213 Euclid St Santa Monica CA 90402 6399000 6622 5 6.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/213-Euclid-St-Santa-Monica-CA-90402/20477486_zpid/
117 1123 11th St APT 4 Santa Monica CA 90403 1295000 1674 3 3.0 231 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1123-11th-St-APT-4-Santa-Monica-CA-90403/20478773_zpid/
118 2521 Euclid St Santa Monica CA 90405 3500000 4071 0 NaN 172 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/2521-Euclid-St-Santa-Monica-CA-90405/2096272609_zpid/
119 1024 Pico Blvd Santa Monica CA 90405 3495000 4874 0 1.0 94 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1024-Pico-Blvd-Santa-Monica-CA-90405/2095394655_zpid/
120 833 Ocean Ave APT 303 Santa Monica CA 90403 2150000 1430 2 2.0 35 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/833-Ocean-Ave-APT-303-Santa-Monica-CA-90403/20485599_zpid/
121 2608 3rd St Santa Monica CA 90405 2775000 2852 4 3.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/2608-3rd-St-Santa-Monica-CA-90405/65242370_zpid/
122 633 24th St Santa Monica CA 90402 5300000 5280 5 9.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/633-24th-St-Santa-Monica-CA-90402/20475973_zpid/
123 1815 20th St Santa Monica CA 90404 2200000 1978 5 4.0 154 House For Sale http://www.zillow.com/homes/for_sale//homedetails/1815-20th-St-Santa-Monica-CA-90404/20473544_zpid/
124 1533 11th St Santa Monica CA 90401 2750000 7509 NaN NaN NaN Lot/Land For Sale http://www.zillow.com/homes/for_sale//homedetails/1533-11th-St-Santa-Monica-CA-90401/20479813_zpid/
125 422 Lincoln Blvd SANTA MONICA CA 90402 3000000 7483 NaN NaN NaN Lot/Land For Sale http://www.zillow.com/homes/for_sale//homedetails/422-Lincoln-Blvd-Santa-Monica-CA-90402/95577103_zpid/
126 Lincoln Blvd SANTA MONICA CA 90402 NaN NaN NaN NaN 45 Pre-Foreclosure (Auction) http://www.zillow.com/homes/for_sale/128608384_zpid/any_days/globalrelevanceex_sort/29.759534,-95.335321,29.675003,-95.502863_rect/12_zm/
127 1422 19th St APT A Santa Monica CA 90404 2190000 1715 2 3.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1422-19th-St-APT-A-Santa-Monica-CA-90404/20474075_zpid/
128 1807 17th St Santa Monica CA 90404 1785000 7749 NaN NaN 97 Lot/Land For Sale http://www.zillow.com/homes/for_sale//homedetails/1807-17th-St-Santa-Monica-CA-90404/20473591_zpid/
129 834 Pearl St Santa Monica CA 90405 1895000 1965 3 4.0 NaN Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/834-Pearl-St-Santa-Monica-CA-90405/2095074075_zpid/
130 1247 25th St Santa Monica CA 90404 2300000 1875 3 10.0 259 House For Sale http://www.zillow.com/homes/for_sale//homedetails/1247-25th-St-Santa-Monica-CA-90404/20474589_zpid/
131 1626 Berkeley St Santa Monica CA 90404 1399000 730 2 1.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/1626-Berkeley-St-Santa-Monica-CA-90404/20470978_zpid/
132 1824 17th St Santa Monica CA 90404 2625000 4777 11 6.0 NaN Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1824-17th-St-Santa-Monica-CA-90404/2103173191_zpid/
133 1622 Berkeley St Santa Monica CA 90404 1399000 1898 2 1.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/1622-Berkeley-St-Santa-Monica-CA-90404/20470977_zpid/
134 419 Hill St Santa Monica CA 90405 2650000 1920 4 4.0 46 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/419-Hill-St-Santa-Monica-CA-90405/2096762504_zpid/
135 201 Ocean Ave UNIT 509B Santa Monica CA 90402 2350000 1562 3 3.0 73 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-509B-Santa-Monica-CA-90402/20487119_zpid/
136 1116 24th St Santa Monica CA 90403 4695000 7636 0 NaN 120 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1116-24th-St-Santa-Monica-CA-90403/20474194_zpid/
137 201 Ocean Ave UNIT 1904P Santa Monica CA 90402 2950000 1554 3 3.0 112 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-1904P-Santa-Monica-CA-90402/20487087_zpid/
138 201 Ocean Ave UNIT 1703P Santa Monica CA 90402 1895000 906 1 1.0 111 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-1703P-Santa-Monica-CA-90402/20487066_zpid/
139 201 Ocean Ave UNIT 903P Santa Monica CA 90402 1750000 906 1 1.0 108 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-903P-Santa-Monica-CA-90402/20486996_zpid/
140 201 Ocean Ave UNIT 1504B Santa Monica CA 90402 3400000 1554 3 3.0 111 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-1504B-Santa-Monica-CA-90402/20487203_zpid/
141 1114 23rd St APT 4 Santa Monica CA 90403 1059000 1604 3 2.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/1114-23rd-St-APT-4-Santa-Monica-CA-90403/2095017507_zpid/
142 201 Ocean Ave UNIT 1209B Santa Monica CA 90402 3800000 1562 3 3.0 111 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-1209B-Santa-Monica-CA-90402/20487188_zpid/
143 201 Ocean Ave UNIT 1105B Santa Monica CA 90402 1395000 868 1 1.0 175 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-1105B-Santa-Monica-CA-90402/20487174_zpid/
144 201 Ocean Ave UNIT 409P Santa Monica CA 90402 4595000 1566 3 3.0 37 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-409P-Santa-Monica-CA-90402/20486952_zpid/
145 201 Ocean Ave UNIT 1903P Santa Monica CA 90402 1850000 906 1 1.0 112 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-1903P-Santa-Monica-CA-90402/20487086_zpid/
146 270 Palisades Beach Rd UNIT 203 Santa Monica CA 90402 5375000 2210 3 4.0 NaN Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/270-Palisades-Beach-Rd-UNIT-203-Santa-Monica-CA-90402/20486911_zpid/
147 201 Ocean Ave UNIT 904P Santa Monica CA 90402 2600000 1554 3 3.0 36 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-904P-Santa-Monica-CA-90402/20486997_zpid/
148 1328 9th St APT 6 Santa Monica CA 90401 1250000 1275 3 3.0 253 Make Me Move® http://www.zillow.com/homes/for_sale//homedetails/1328-9th-St-APT-6-Santa-Monica-CA-90401/20479630_zpid/
149 20 Ocean Park Blvd APT 22 Santa Monica CA 90405 2595000 2663 2 3.0 126 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/20-Ocean-Park-Blvd-APT-22-Santa-Monica-CA-90405/20483828_zpid/
150 1417 10th St Santa Monica CA 90401 2999000 4541 8 4.0 46 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1417-10th-St-Santa-Monica-CA-90401/20479771_zpid/
151 2817 3rd St Santa Monica CA 90405 3000000 2600 5 5.0 62 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/2817-3rd-St-Santa-Monica-CA-90405/20482878_zpid/
152 201 Ocean Ave UNIT 403P Santa Monica CA 90402 1650000 906 1 1.0 205 Condo For Sale http://www.zillow.com/homes/for_sale//homedetails/201-Ocean-Ave-UNIT-403P-Santa-Monica-CA-90402/20486946_zpid/
153 2903 Delaware Ave Santa Monica CA 90404 1599000 1666 3 3.0 70 House For Sale http://www.zillow.com/homes/for_sale//homedetails/2903-Delaware-Ave-Santa-Monica-CA-90404/20471195_zpid/
154 905 Berkeley St Santa Monica CA 90403 3750000 3495 4 5.0 32 House For Sale http://www.zillow.com/homes/for_sale//homedetails/905-Berkeley-St-Santa-Monica-CA-90403/20469353_zpid/
155 1159 Centinela Ave Santa Monica CA 90403 3499000 2809 4 3.0 NaN House For Sale http://www.zillow.com/homes/for_sale//homedetails/1159-Centinela-Ave-Santa-Monica-CA-90403/20467856_zpid/
156 2310 Ocean Park Blvd Santa Monica CA 90405 1995000 3260 5 5.0 169 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/2310-Ocean-Park-Blvd-Santa-Monica-CA-90405/2097905572_zpid/
157 1824 10th St Santa Monica CA 90404 2725000 4000 6 5.0 172 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/1824-10th-St-Santa-Monica-CA-90404/20479961_zpid/
158 2910 Highland Ave Santa Monica CA 90405 2195000 NaN 0 NaN 106 Apartment For Sale http://www.zillow.com/homes/for_sale//homedetails/2910-Highland-Ave-Santa-Monica-CA-90405/20483058_zpid/
159 2602 3rd St Santa Monica CA 90405 4495000 4103 4 3.0 209 House For Sale http://www.zillow.com/homes/for_sale//homedetails/2602-3rd-St-Santa-Monica-CA-90405/20482538_zpid/
160 1144 Chelsea Ave APT A Santa Monica CA 90403 2245000 2748 3 4.0 NaN Townhouse For Sale http://www.zillow.com/homes/for_sale//homedetails/1144-Chelsea-Ave-APT-A-Santa-Monica-CA-90403/20474178_zpid/

```

Let’s visualize the relationship between size and price of this dataset.

# keep only the houses that have both a price and a size
data = dataHouses[dataHouses.price.notna() & dataHouses.sqft.notna()]
print(f"Size of the dataset: {len(dataHouses)} (original), {len(data)} (after dropping rows with NaN price or sqft)")
## Size of the dataset: 160 (original), 142 (after dropping rows with NaN price or sqft)
import altair as alt
houses = alt.Chart(data).mark_circle().encode(
    alt.Color('bathrooms:Q',
        scale=alt.Scale(scheme='viridis')),
    x='sqft:Q',
    y='price:Q',
    size='bedrooms',
    tooltip=['price', 'sqft', 'bedrooms', 'bathrooms']
    ).configure_mark(opacity=0.7).interactive()

3.3.2 Visualization of the cost on the (\(\theta_0\), \(\theta_1\)) space

We’re dealing with a very simple linear model here: we only have one feature (the size of the house) to predict the price. Our model is therefore a line defined by an intercept (\(\theta_0\)) and a slope (\(\theta_1\)).

Let’s see what the cost \(J(\theta)\), computed on this dataset, looks like over a range of \((\theta_0, \theta_1)\) values.

NumPy provides np.meshgrid as a convenient way to create a grid of parameters over which to evaluate a function. We will use it to create grids of \(\theta_0\) and \(\theta_1\) values, so the cost function has to be modified to do its computation on these 2D grids instead of on a single \(\theta\) vector, while the features and true values stay column vectors.
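A minimal sketch of what np.meshgrid produces, on a hypothetical three-point dataset (where y = 2 + x exactly) and deliberately tiny parameter ranges, so every cell can be checked by hand. Note the layout: rows follow the second argument (the slopes) and columns follow the first (the intercepts). The cost is evaluated here with a plain double loop, the slow but obvious way.

```python
import numpy as np

# hypothetical toy data: three houses, one (already scaled) feature, y = 2 + x
x = np.array([-1.0, 0.0, 1.0])
y = np.array([1.0, 2.0, 3.0])

intercepts = np.array([0.0, 1.0, 2.0])  # 3 candidate values for theta0
slopes = np.array([0.0, 1.0])           # 2 candidate values for theta1

# THETA0[r, c] = intercepts[c] and THETA1[r, c] = slopes[r]
THETA0, THETA1 = np.meshgrid(intercepts, slopes)
print(THETA0.shape)  # (2, 3)

# J(theta0, theta1) = 1/(2m) * sum((theta0 + theta1*x - y)^2), one value per grid cell
m = len(x)
cost = np.empty_like(THETA0)
for r in range(THETA0.shape[0]):
    for c in range(THETA0.shape[1]):
        residuals = THETA0[r, c] + THETA1[r, c]*x - y
        cost[r, c] = np.sum(residuals**2)/(2*m)
print(cost)  # the minimum (0) sits at theta0 = 2, theta1 = 1
```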

Feature scaling/normalization

To improve the performance of gradient descent, we will scale our feature by subtracting its mean and dividing by its standard deviation.
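A minimal sketch of this z-score scaling on a handful of sizes taken from the listing above. One detail worth knowing: pandas’ Series.std() uses the sample standard deviation (ddof=1) by default, whereas np.std uses the population one (ddof=0), so mixing the two libraries gives slightly different scaled values.

```python
import numpy as np
import pandas as pd

# a few sizes (sqft) from the listing above
sqft = pd.Series([730.0, 1430.0, 1978.0, 3240.0, 6622.0])

mu = sqft.mean()
sigma = sqft.std()            # pandas default: sample standard deviation (ddof=1)
scaled = (sqft - mu)/sigma    # z-scores: mean 0, sample std 1

numpy_std = np.std(scaled.values)   # NumPy default: population std (ddof=0)
print(scaled.mean(), scaled.std(), numpy_std)   # numpy_std is sqrt(4/5) for n = 5
```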

First, let’s create a meshgrid for our parameters using numpy:

intercept = np.linspace(-10000, 5000000, 500) # range of values for the intercept we want to cover
slope = np.linspace(-10000, 5000000, 500) # range of values for the slope we want to cover
# np.meshgrid creates every (intercept, slope) pair from those ranges:
# THETA0[i, j] = intercept[j] and THETA1[i, j] = slope[i], both of shape (500, 500)
THETA0, THETA1 = np.meshgrid(intercept, slope)
# Feature normalization (z-score)
mu = data.sqft.mean() # mean
sigma = data.sqft.std() # standard deviation (pandas default: ddof=1)
SIZE = ((data.sqft - mu)/sigma).values
TRUEVALUES = data.price.values # the target is not normalized
# reshape features and true values into (m, 1) column vectors
features = SIZE[:, np.newaxis]
trueValues = TRUEVALUES[:, np.newaxis]

This is what the arrays look like right now:

## THETA0 shape: (500, 500)
## THETA1 shape: (500, 500)
## features shape: (142, 1)
## trueValues shape: (142, 1)

Now we need to define functions that can handle NumPy broadcasting:

# prediction from the line equation; works element-wise on arrays of compatible shapes
def computePrediction(theta0, theta1, features):
    return theta0 + theta1*features
# cost function that can handle meshgrids, using the broadcasting abilities of numpy.
# CANNOT BE GENERALIZED TO MORE DIMENSIONS WITH THE CURRENT IMPLEMENTATION
def costFuncGrid(theta, features, trueValues):
    theta0, theta1 = theta
    # (500, 500) grids become (500, 1, 500): the new middle axis will hold the m examples
    theta0 = theta0[:, np.newaxis]
    theta1 = theta1[:, np.newaxis]
    # broadcast the (m, 1) features to (500, m, 500), so features.shape[1] == m
    features = features*np.ones_like(theta1)
    prediction = computePrediction(theta0, theta1, features)
    # sum the squared errors over the m examples (axis=1) -> (500, 500) cost grid
    return np.sum((prediction-trueValues)**2, axis=1)/(2*features.shape[1])
costValueGrid = costFuncGrid([THETA0, THETA1], features, trueValues)
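As a sanity check of the broadcasting logic, a grid cost computed with broadcasting can be compared against an explicit double loop on small hypothetical data. The sketch below uses a slightly different layout than costFuncGrid (the grid axes come first and the m examples sit on a new trailing axis); the helper name cost_grid and this layout are assumptions of the example. Either layout builds an intermediate array of R × C × m values, which for the real 500 × 500 grid and 142 houses is about 35.5 million entries, so memory grows quickly with finer grids.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20
x = rng.normal(size=m)                               # hypothetical scaled feature
y = 3.0 + 2.0*x + rng.normal(scale=0.1, size=m)      # hypothetical targets

# a small (6, 7) grid of (theta0, theta1) pairs
THETA0, THETA1 = np.meshgrid(np.linspace(0, 6, 7), np.linspace(-1, 4, 6))

def cost_grid(theta0, theta1, x, y):
    # (R, C) grids -> (R, C, 1); x and y of shape (m,) broadcast along the last axis
    prediction = theta0[..., np.newaxis] + theta1[..., np.newaxis]*x   # (R, C, m)
    return np.sum((prediction - y)**2, axis=-1)/(2*len(x))             # (R, C)

fast = cost_grid(THETA0, THETA1, x, y)

# reference: one explicit cost evaluation per grid cell
slow = np.array([[np.sum((t0 + t1*x - y)**2)/(2*m) for t0, t1 in zip(row0, row1)]
                 for row0, row1 in zip(THETA0, THETA1)])
print(fast.shape, np.allclose(fast, slow))
```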

And now for the visualization

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(12, 4))
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(122, projection='3d')
fig.suptitle("Cost function")
# 2D viz
ax1.contour(THETA0, THETA1, costValueGrid, 80)
ax1.set_xlabel(r"$\theta_0$ (intercept)")
ax1.set_ylabel(r"$\theta_1$ (slope)")
# 3D viz
ax2.plot_surface(THETA0, THETA1, costValueGrid, cmap="viridis")
ax2.set_xlabel(r"$\theta_0$ (intercept)")
ax2.set_ylabel(r"$\theta_1$ (slope)");
plt.tight_layout()
plt.show()

3.3.3 Gradient Descent algorithm for simple linear regression

Alright, now that we have set up our cost function space and we’ve seen it’s a convex function, let’s use gradient descent to find the minimum.
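The convexity seen on the plot can also be checked algebraically: \(J(\theta)=\frac{1}{2m}\lVert X\theta-y\rVert^2\) is quadratic in \(\theta\), and its Hessian \(\frac{1}{m}X^TX\) (with \(X\) the design matrix whose first column is \(x_0=1\)) is positive semi-definite, which makes \(J\) convex. The sketch below checks the eigenvalues of that Hessian on hypothetical scaled sizes; note that it depends on neither \(\theta\) nor the prices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)                      # hypothetical scaled sizes
X = np.column_stack([np.ones_like(x), x])    # design matrix with x_0 = 1
m = X.shape[0]

# J(theta) = 1/(2m) * ||X @ theta - y||^2 is quadratic in theta;
# its Hessian X^T X / m depends on neither theta nor y
H = X.T @ X / m
eigenvalues = np.linalg.eigvalsh(H)  # H is symmetric, so eigvalsh applies
print(eigenvalues)                   # none negative -> J is convex
```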

At each iteration, the Gradient Descent algorithm checks which direction on the cost surface is the steepest downhill, and updates \(\theta_0\) and \(\theta_1\) by a step whose size is set by the learning rate \(\alpha\). Note that if \(\alpha\) is too low, it will take a long time to reach convergence, and if it is too high, the updates overshoot and bounce from one “wall” of the convex bowl to the other, moving away from the minimum (we’ll visualize that too).

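For \(j=0\) and \(j=1\), repeat until convergence, updating both parameters simultaneously:

\[\theta_j := \theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1)\]

with \(J(\theta_0, \theta_1)\) being the same cost function that we used previously, where \(h_\theta(x)=\theta_0+\theta_1\times x_1\) is the prediction of our model:

\[J(\theta_0, \theta_1)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2\]

and \(\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1)\) being the partial derivative of this cost function (remember that \(x_0=1\)):

\[\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1)=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)\times x_j^{(i)}\]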

This is the derivative that we need to compute in the gradient descent algorithm.
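To make the update rule concrete, here is a single gradient descent step on a hypothetical three-house dataset where y = 2 + x exactly, starting from \(\theta=(0, 0)\) with \(\alpha=0.5\). By hand: the errors \(h_\theta(x^{(i)})-y^{(i)}\) are \((-1, -2, -3)\), so \(\frac{\partial J}{\partial\theta_0}=-6/3=-2\) and \(\frac{\partial J}{\partial\theta_1}=(1+0-3)/3=-2/3\), giving \(\theta_0=0-0.5\times(-2)=1\) and \(\theta_1=0-0.5\times(-2/3)=1/3\). The vectorized form \(\frac{1}{m}X^T(X\theta-y)\) computes both partial derivatives at once.

```python
import numpy as np

# hypothetical toy data: y = 2 + x exactly
x = np.array([-1.0, 0.0, 1.0])
y = np.array([1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])   # x_0 = 1 for the intercept
m = len(y)

theta = np.array([0.0, 0.0])  # [theta0, theta1]
alpha = 0.5

errors = X @ theta - y             # h_theta(x^(i)) - y^(i) -> [-1, -2, -3]
gradient = X.T @ errors / m        # [dJ/dtheta0, dJ/dtheta1] -> [-2, -2/3]
theta = theta - alpha*gradient     # both thetas updated simultaneously
print(theta)                       # theta0 = 1.0, theta1 = 1/3
```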

We’ll write the gradient descent function as a generator, so we can keep and analyze all the steps the algorithm went through:

def gradientDescent(startingTheta, features, trueValues, alpha=0.1, maxIterations=1000):
    # number of examples in the training set
    m = np.array(features).shape[0]
    theta = np.array(startingTheta, dtype=float)
    # cap the number of iterations in case the algorithm doesn't converge
    for i in range(maxIterations):
        prediction = np.dot(features, theta)
        # partial derivative of the cost for each theta_j, as a (n+1, 1) column
        derivative = np.sum((prediction - trueValues)*features, axis=0)[:, np.newaxis]/m
        # simultaneous update of all the thetas
        newTheta = theta - alpha*derivative
        yield np.ravel(newTheta)
        # converged: the update no longer changes any component of theta
        if np.array_equal(newTheta, theta):
            break
        theta = newTheta

Let’s run gradient descent on our dataset, and experiment with different learning rates \(\alpha\)!

# Initial [intercept, slope] values we'll start iterating from
initialTheta = np.array([1000000, 2500000])[:, np.newaxis]
# design matrix: a column of ones (x_0 = 1) and the sizes, which are already scaled in SIZE
features = np.column_stack([np.ones(len(SIZE)), SIZE])
trueValues = TRUEVALUES[:, np.newaxis]
# we'll run gradient descent with different learning rates
alphaList = [0.1, 0.3, 0.5, 1.5, 2.1]
gdList = [gradientDescent(initialTheta, features, trueValues, alpha=a) for a in alphaList]
# prepend the starting point to the steps yielded by each generator
descentList = [np.vstack([initialTheta.transpose(), np.vstack(list(gd))]) for gd in gdList]

Great, everything ran smoothly, and we stored the steps for each learning rate \(\alpha\).
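Because the least-squares cost has a closed-form minimum, one way to confirm that gradient descent really reached the bottom of the bowl is to compare its final \(\theta\) with np.linalg.lstsq on the same scaled design matrix. The sketch below does this on hypothetical data, with a compact loop applying the same update rule as gradientDescent (without the generator).

```python
import numpy as np

rng = np.random.default_rng(42)
sqft = rng.uniform(700, 7000, size=100)                   # hypothetical sizes
price = 50_000 + 600*sqft + rng.normal(0, 1e5, size=100)  # hypothetical prices

# standardize the feature, then add the x_0 = 1 column
z = (sqft - sqft.mean())/sqft.std()
X = np.column_stack([np.ones_like(z), z])
y = price[:, np.newaxis]

theta = np.zeros((2, 1))
alpha = 0.1
for _ in range(1000):
    theta = theta - alpha*(X.T @ (X @ theta - y))/len(y)

# closed-form least-squares solution on the same scaled features
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta.ravel(), theta_ls.ravel(), np.allclose(theta, theta_ls))
```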

Let’s visualize how it looks on the cost function space that we computed previously.

fig,axes = plt.subplots(ncols=len(alphaList), figsize=(18, 4))
for i in range(len(alphaList)):
    axes[i].contour(THETA0, THETA1, costValueGrid, 50)
    axes[i].plot(descentList[i][:, 0], descentList[i][:, 1], "-x", color="red")
    axes[i].set_title(r"Learning rate $\alpha$" f"={alphaList[i]}, iter={len(descentList[i])}")
    
# no convergence for alpha=2.1, let's rescale our graph
axes[4].set_xlim(intercept.min(), intercept.max())
axes[4].set_ylim(slope.min(), slope.max())
plt.tight_layout()
plt.show()

We see that for \(\alpha=2.1\), the learning rate is too high: we bounce back and forth between the walls of the bowl, getting farther and farther from its bottom.
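A short derivation shows where the cutoff lies, assuming the feature is standardized with the population standard deviation (mean 0, variance 1). Then \(\frac{1}{m}X^TX=I\), and since the minimum \(\theta^*\) satisfies \(X^TX\theta^*=X^Ty\), the gradient is simply \(\theta-\theta^*\). Each update therefore becomes \(\theta-\theta^* \leftarrow (1-\alpha)(\theta-\theta^*)\): the distance to the minimum shrinks only if \(|1-\alpha|<1\), i.e. \(0<\alpha<2\). For \(\alpha=1.5\) the factor is \(-0.5\) (oscillating but converging), while for \(\alpha=2.1\) it is \(-1.1\) (each step overshoots by more than it corrects). If the feature is instead scaled with pandas’ default sample standard deviation (ddof=1), the slope factor becomes \(1-\alpha\frac{m-1}{m}\), almost the same for large m. The sketch below checks the factor numerically on hypothetical data.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=30)
z = (x - x.mean())/x.std()                          # ddof=0: mean(z) = 0, mean(z**2) = 1
X = np.column_stack([np.ones_like(z), z])           # so X.T @ X / m is the identity
y = 1.0 + 3.0*z + rng.normal(scale=0.5, size=30)    # hypothetical targets
theta_star, *_ = np.linalg.lstsq(X, y, rcond=None)  # the bottom of the bowl

ratios = {}
for alpha in [0.1, 0.5, 1.5, 2.1]:
    theta = np.array([10.0, -10.0])                 # arbitrary starting point
    new_theta = theta - alpha*(X.T @ (X @ theta - y))/len(y)
    # distance to the minimum after one step, divided by the distance before it
    ratios[alpha] = (new_theta - theta_star)/(theta - theta_star)
    print(alpha, ratios[alpha], 1 - alpha)
```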

Let’s see that in 3D, for fun:

fig,axes = plt.subplots(ncols=len(alphaList)-2, figsize=(18, 4), subplot_kw={"projection": '3d'})
for i in range(len(alphaList)-2):
    theta = descentList[i+2].transpose()
    axes[i].set_title(r"Learning rate $\alpha$" f"={alphaList[i+2]}")
    axes[i].plot_surface(THETA0, THETA1, costValueGrid, cmap="viridis")
    axes[i].plot(np.ravel(descentList[i+2][:, 0]), np.ravel(descentList[i+2][:, 1]), np.ravel(computeCost(theta, features, trueValues)), "-o", color="red", alpha=0.5)
    axes[i].set_xlim(intercept.min(), intercept.max())
    axes[i].set_ylim(slope.min(), slope.max())
    axes[i].set_zlim(0, costValueGrid.max())
    axes[i].set_xlabel(r"$\theta_0$ (intercept)")
    axes[i].set_ylabel(r"$\theta_1$ (slope)");
plt.tight_layout()
plt.show()

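What is the cost of our model at each step? Because the \(\theta\) coefficients obtained by gradient descent were computed on scaled features, the cost has to be computed with those same scaled features; unscaling the coefficients is only needed to plot the fitted line on the original sizes.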

Now we can compute the cost of our predictions at every step of gradient descent, for all the \(\alpha\) learning rates.

# cost at every step, for each learning rate
costValues = [np.ravel(computeCost(theta.transpose(), features, trueValues)) for theta in descentList]
costDataDf = pd.DataFrame(index=range(np.max([len(c) for c in costValues])), columns=[f"alpha {alpha}" for alpha in alphaList])
for i,alpha in enumerate(alphaList):
    # runs can stop at different iterations, so fill each column only up to its own length
    costDataDf.loc[:len(costValues[i])-1, f"alpha {alpha}"] = costValues[i]
    
#convert to long form data
costDataDf["iteration"] = costDataDf.index
costDataDf_Long = pd.melt(costDataDf, id_vars=['iteration'], value_vars=costDataDf.columns[:-1])
plt.style.use('ggplot')
fig,ax = plt.subplots()
n = 20
for i in range(len(alphaList)):
    ax.plot(costValues[i], label=r"$\alpha$" f"={alphaList[i]}")
ax.set_xlim(0, n)
ax.set_ylim(0, 3e13)
ax.set_xlabel("Iterations")
ax.set_ylabel("Cost")
fig.suptitle(f"Cost over the first {n} iterations of gradient descent")
fig.legend();
plt.show()

We can see that convergence takes longer for small learning rates \(\alpha\), and that when \(\alpha\) is too high, we do not reach convergence and the cost increases.

We can also visualize how our linear regression evolved during gradient descent. To plot our linear regression on the original data, we need to unscale our \(\theta\) coefficients, as they were computed using feature scaling. Let’s define a function to do just that.

import copy
def unscaleTheta(coefs, df, orderedFeatureNames):
    # coefs: one row per step, columns [theta_0, theta_1, ..., theta_n]
    # undo z = (x - mean)/std: y = t0 + t1*z = (t0 - t1*mean/std) + (t1/std)*x
    for i, feature in enumerate(orderedFeatureNames):
        coefs[:, 0] = coefs[:, 0] - (coefs[:, i+1]*df[feature].mean()/df[feature].std())
        coefs[:, i+1] = coefs[:, i+1]/df[feature].std()
    return coefs
# the function mutates the original coefs so let's deepcopy the original list
descentListUnscaled = copy.deepcopy(descentList)
for i in range(len(descentListUnscaled)):    
    unscaleTheta(descentListUnscaled[i], data, ['sqft'])
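Unscaling changes the units of \(\theta\), not the model itself. The sketch below applies the same algebra as unscaleTheta to a single hypothetical \(\theta\) pair on a few sizes from the listing, and checks that scaled \(\theta\) on scaled sizes and unscaled \(\theta\) on raw sizes give identical predictions. This is also why the cost (and the RMSE) does not depend on whether it is computed before or after unscaling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sqft": [730.0, 1430.0, 1978.0, 3240.0, 6622.0]})
mu, sigma = df.sqft.mean(), df.sqft.std()   # pandas std (ddof=1), as in the scaling step
z = (df.sqft - mu)/sigma

theta_scaled = np.array([2.0e6, 9.0e5])     # hypothetical [theta0, theta1] fitted on z

# y = t0 + t1*(x - mu)/sigma = (t0 - t1*mu/sigma) + (t1/sigma)*x
theta0_raw = theta_scaled[0] - theta_scaled[1]*mu/sigma
theta1_raw = theta_scaled[1]/sigma

pred_scaled = theta_scaled[0] + theta_scaled[1]*z   # scaled theta on scaled sizes
pred_raw = theta0_raw + theta1_raw*df.sqft           # unscaled theta on raw sizes
print(np.allclose(pred_scaled, pred_raw))
```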

Let’s take the example of \(\alpha=0.1\)

3.3.4 Accuracy of our model

Now that gradient descent has converged toward a minimum, we can predict the prices of the houses based on their sizes by using the computed \(\theta\) vector.

The price prediction for each house can be compared to its real price to get an idea of the accuracy of our trained model. One common metric for the discrepancy between predicted and true values is the root mean squared error, \(RMSE=\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2}\). Let’s define a function to compute the \(RMSE\):

def rmse(actual, prediction):
    n = len(prediction)
    return np.sqrt(np.sum(np.power(prediction - actual, 2))/n)
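A worked example with hypothetical numbers: errors of \(-1, 0, 2\) give squares \(1, 0, 4\), a mean of \(5/3\) and an \(RMSE\) of \(\sqrt{5/3}\approx 1.291\). The second part shows a shape pitfall worth keeping in mind since our arrays are (m, 1) columns: if one argument is (m, 1) and the other is a flat (m,) array, NumPy broadcasting turns prediction − actual into an (m, m) matrix and the result is silently wrong.

```python
import numpy as np

def rmse(actual, prediction):
    # same formula as above: square root of the mean squared error
    n = len(prediction)
    return np.sqrt(np.sum(np.power(prediction - actual, 2))/n)

actual = np.array([3.0, 5.0, 7.0])       # hypothetical true prices
prediction = np.array([2.0, 5.0, 9.0])   # hypothetical predictions
good = rmse(actual, prediction)
print(good)  # sqrt(5/3) ≈ 1.291

# pitfall: mixing a (n, 1) column with a flat (n,) array broadcasts to (n, n)
wrong = rmse(actual[:, np.newaxis], prediction)
print(np.isclose(wrong, good))  # False
```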

Let’s compute the \(RMSE\) from our study:

# let's get the final theta obtained with alpha=0.5 (third value of alphaList)
fittedTheta = descentList[2][-1][:, np.newaxis]
# predicted prices for all the houses
predictions = features@fittedTheta
print(f"RMSE: {rmse(trueValues, predictions):.2f}")
## RMSE: 1631312.69

We can also look at the residual plot of our regression analysis to check that the residuals show no pattern: they should look random and unpredictable, otherwise our model is missing some structure in the data.

fig,ax = plt.subplots()
ax.plot(trueValues-predictions, "o", alpha=0.6)
ax.set_ylabel("True - Predicted price ($)")
ax.set_xlabel("Individual houses")
plt.show()
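Least squares with an intercept guarantees two properties of the residuals: they average to zero and they are uncorrelated with the fitted values. The sketch below (hypothetical data, with sizes in thousands of sqft, fitted with np.linalg.lstsq instead of gradient descent) checks both numerically. Because these hold by construction, they cannot reveal a poor model on their own; the residual plot is what exposes curvature, or a spread that grows with the predicted price.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.5, 7.0, size=80)                  # hypothetical sizes (thousands of sqft)
y = 2e5 + 5e5*x + rng.normal(0, 3e5, size=80)       # hypothetical prices ($)
X = np.column_stack([np.ones_like(x), x])           # x_0 = 1 column for the intercept

theta, *_ = np.linalg.lstsq(X, y, rcond=None)       # least-squares fit
fitted = X @ theta
residuals = y - fitted

mean_residual = residuals.mean()                    # ~0 by construction
corr = np.corrcoef(residuals, fitted)[0, 1]         # ~0 by construction
print(mean_residual, corr)
```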

Next up, let’s see how to apply gradient descent to multivariate linear regression.